12-SD Samantaray

Authors

  • B Clarke
  • J.-H Chu
Abstract

SUMMARY Consider a regression problem in which there are many more explanatory variables than data points, i.e., p >> n. Essentially, without reducing the number of variables, inference is impossible. So, we group the p explanatory variables into blocks by clustering, evaluate statistics on the blocks, and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of n, p, classes of statistics, clustering algorithms, penalty terms, and data types. When n is not large, the discrimination over the number of statistics is weak, but computations suggest regressing on approximately [n/K] statistics, where K is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an L_q norm with high enough q.

[...] as the least squares estimator of β, provided the inverse exists. If |X'X| is small, the inverse is large in the sense that some of its eigenvalues must be large. When p > n, X is n × p, i.e., short and fat. For Short Fat Data (SFD), |X'X| = 0, so the inverse of X'X fails to exist. The central issue here is that the mean function for Y, EY, is in a space of dimension p while only n < p data points are available. That is, the SFD or 'large p, small n' problem would disappear if we had more data. However, even though one can imagine arbitrarily large n's, in practice they do not exist. Alternatively, we can try to do effective dimension reduction by regressing Y on functions of the X_i's. The idea is that if we evaluate a comparatively small number of suitably chosen functions on each X_i, i.e., features, and then do penalized regression on those features, we will have retained all the information in the data about the response Y. The question is what kind of statistics ...
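The approach summarized above (cluster the variables into blocks, evaluate statistics on the blocks, then run a penalized regression on those statistics) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the simulated data, plain k-means as the clustering algorithm, the block mean as the single statistic per block, a ridge (L_2) penalty with `lam=1.0`, and the helper names `cluster_columns`, `block_means`, and `ridge_fit`. The sketch also checks that X has rank at most n when p > n, which is why X'X is singular for short fat data.

```python
import numpy as np

def cluster_columns(X, K, n_iter=20, seed=0):
    """Plain k-means on the columns of X (assumed clustering choice).

    Each explanatory variable is treated as a point in R^n.
    Returns one block label per variable.
    """
    rng = np.random.default_rng(seed)
    cols = X.T.copy()                                   # shape (p, n)
    start = rng.choice(cols.shape[0], size=K, replace=False)
    centers = cols[start].copy()                        # shape (K, n)
    labels = np.zeros(cols.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared distance from every variable to every center: shape (p, K)
        dist = ((cols[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(K):
            members = cols[labels == k]
            if len(members) > 0:                        # keep old center if block is empty
                centers[k] = members.mean(axis=0)
    return labels

def block_means(X, labels):
    """One statistic per non-empty block: the mean of its variables (assumed choice)."""
    blocks = np.unique(labels)
    return np.column_stack([X[:, labels == k].mean(axis=1) for k in blocks])

def ridge_fit(Z, y, lam):
    """Ridge (L_2-penalized least squares) coefficients, no intercept."""
    m = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)

# Simulated short fat data: n = 20 observations, p = 200 variables (hypothetical).
rng = np.random.default_rng(0)
n, p, K = 20, 200, 5
X = rng.normal(size=(n, p))
y = X[:, :10].sum(axis=1) + rng.normal(size=n)

# rank(X'X) = rank(X) <= n < p, so the p x p matrix X'X cannot be inverted.
rank_X = np.linalg.matrix_rank(X)

labels = cluster_columns(X, K)        # block membership of each variable
Z = block_means(X, labels)            # n x (number of non-empty blocks) feature matrix
beta_hat = ridge_fit(Z, y, lam=1.0)   # coefficients on the block statistics

print("rank of X:", rank_X, "with p =", p)
print("features used:", Z.shape[1])
print("ridge coefficients:", beta_hat)
```

The regression is carried out on at most K features rather than p variables, so the penalized normal equations are well posed even though ordinary least squares on X is not. Using more than one statistic per block, or a different penalty, would only change `block_means` and `ridge_fit`.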




Publication date: 2014